Loading optimizations by Cyrilvallez · Pull Request #36742 · huggingface/transformers

Cyrilvallez · 2025-03-15T14:08:29Z

What does this PR do?

Profiling of CPU memory usage showed that calling gc.collect() has no effect: the del statement is enough, and cleanup happens automatically as it should. In other words, peak CPU memory usage never exceeds one state dict, even with older sharded .bin checkpoints.
Removing the call speeds up loading by 15-20%, since gc.collect() is costly and should generally not be called.
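
To illustrate why the del statement alone can be enough, here is a minimal sketch that relies on CPython reference counting. `FakeStateDict` and `load_shards` are hypothetical names made up for this example and are not part of transformers; the real loading loop handles tensors, not bytearrays.

```python
import gc
import weakref


class FakeStateDict(dict):
    """Hypothetical stand-in for one checkpoint shard.

    A dict subclass is used because plain dicts cannot be weakly referenced.
    """


def load_shards(num_shards, force_gc=False):
    """Simulate loading shards one at a time and report whether each one
    was already freed right after `del` (and the optional gc.collect())."""
    freed_flags = []
    for _ in range(num_shards):
        shard = FakeStateDict(weight=bytearray(1024))
        ref = weakref.ref(shard)  # lets us observe the object without keeping it alive
        del shard  # in CPython the refcount drops to zero and the shard is freed here
        if force_gc:
            gc.collect()  # the extra call this PR removes
        freed_flags.append(ref() is None)
    return freed_flags


all_freed_without_gc = all(load_shards(4))
```

Under CPython, objects without reference cycles are released as soon as their last reference goes away, so the explicit collection adds cost without freeing anything extra in this sketch. Other Python implementations may behave differently.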

As for the caching allocator, profiling shows that an allocation_factor > 1 does not help at all, either for TP or for regular loading on multiple GPUs. It is therefore much simpler to always use 1, since we then never risk blowing up GPU memory. Also, the code was not allocating anything when the allocation size was larger than the GPU size, which could lead to a big slowdown if that state was reached only because of the factor of 2, as the model may still fit.
Also improved the granularity of the allocator by checking the dtype of each param, which may differ with composite models or keep_in_fp_32_modules.
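
The sizing logic described above can be sketched in plain Python. This is only an illustration under stated assumptions: `BYTES_PER_ELEMENT`, `bytes_needed_per_device`, and `requested_bytes` are hypothetical helpers invented here, and parameters are modelled as `(device, dtype_name, num_elements)` tuples rather than real tensors. It is not the code in modeling_utils.py.

```python
from collections import defaultdict

# Hypothetical lookup table: element size in bytes for a few common dtypes.
BYTES_PER_ELEMENT = {"float32": 4, "float16": 2, "bfloat16": 2, "int8": 1}


def bytes_needed_per_device(params):
    """Sum the bytes each device needs, using every parameter's own dtype
    instead of assuming one dtype for the whole model."""
    totals = defaultdict(int)
    for device, dtype_name, num_elements in params:
        totals[device] += num_elements * BYTES_PER_ELEMENT[dtype_name]
    return dict(totals)


def requested_bytes(params, allocation_factor=1):
    """Bytes a warmup allocation would request per device for a given factor."""
    needed = bytes_needed_per_device(params)
    return {device: allocation_factor * size for device, size in needed.items()}


# Toy model: mostly bfloat16, with one module kept in float32.
params = [
    ("cuda:0", "bfloat16", 1000),
    ("cuda:0", "float32", 100),
    ("cuda:1", "bfloat16", 500),
]
capacity = {"cuda:0": 4000, "cuda:1": 4000}  # made-up device sizes in bytes

with_factor_1 = requested_bytes(params, allocation_factor=1)
with_factor_2 = requested_bytes(params, allocation_factor=2)

fits_with_factor_1 = all(with_factor_1[d] <= capacity[d] for d in capacity)
fits_with_factor_2 = all(with_factor_2[d] <= capacity[d] for d in capacity)
```

In this toy setup the model needs 2400 bytes on cuda:0, which fits in 4000, but a factor of 2 requests 4800 and exceeds the device. That is the situation the description warns about: the request is too large only because of the factor, even though the model itself still fits.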

github-actions · 2025-03-15T14:08:41Z

Hi 👋, thank you for opening this pull request! The pull request is converted to draft by default. When it is ready for review, please click the Ready for review button (at the bottom of the PR page).

HuggingFaceDocBuilderDev · 2025-03-15T14:45:15Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

src/transformers/modeling_utils.py

improvements

51b34f0

github-actions bot marked this pull request as draft March 15, 2025 14:08

Cyrilvallez marked this pull request as ready for review March 15, 2025 14:18

github-actions bot requested review from ArthurZucker and Rocketknight1 March 15, 2025 14:18

Update modeling_utils.py

099df25

ArthurZucker approved these changes Mar 17, 2025

shethaadit approved these changes Mar 17, 2025

Cyrilvallez added 2 commits March 18, 2025 16:20

add some doc about loading

d06aada

Update modeling_utils.py

8d00574

Cyrilvallez merged commit db1d4c5 into main Mar 18, 2025
24 checks passed

Cyrilvallez deleted the loading-optimizations branch March 18, 2025 15:38

inf3rnus mentioned this pull request Apr 11, 2025

facebook/opt-30b Cuda Allocation Error with version >= 4.50.0 code #37436

Closed

ydshieh mentioned this pull request Jun 23, 2025

fix mistral and mistral3 tests #38978

Merged

Cyrilvallez merged 4 commits into main from loading-optimizations
